Abstract This paper addresses the problem of representing the set of repairs of a 
possibly inconsistent database by means of a disjunctive database. Specifically, the 
class of denial constraints is considered. We show that, given a database and a set 
of denial constraints, there exists a (unique) disjunctive database, called canonical, 
which represents the repairs of the database w.r.t. the constraints and is contained in 
any other disjunctive database with the same set of minimal models. We propose an 
algorithm for computing the canonical disjunctive database. Finally, we study the size 
of the canonical disjunctive database in the presence of functional dependencies for 
both repairs and cardinality-based repairs. 

Keywords Inconsistent databases · Incomplete databases · Repairs · Disjunctive 
databases 



Cristian Molinaro 
DEIS, Università della Calabria, 87036 Rende, Italy 
E-mail: cmolinaro@deis.unical.it 

Jan Chomicki 
Department of Computer Science and Engineering, 201 Bell Hall, The State University of New 
York at Buffalo, Buffalo, NY 14260, USA 
E-mail: chomicki@cse.buffalo.edu 

Jerzy Marcinkowski 
Institute of Informatics, Wroclaw University, Przesmyckiego 20, 51-151 Wroclaw, Poland 
E-mail: jma@cs.uni.wroc.pl 

1 Introduction 

The problem of managing inconsistent data nowadays arises in several scenarios. How 
to extract reliable information from inconsistent databases, i.e. databases violating 
integrity constraints, has been extensively studied in the past several years. Most of 
the works in the literature rely on the notions of repair and consistent query answer [2]. 
Intuitively, a repair for a database w.r.t. a set of integrity constraints is a consistent 
database which "minimally" differs from the (possibly inconsistent) original database. 
The consistent answers to a query over an inconsistent database are those tuples which 
can be obtained by evaluating the query in every repair of the database. Let us illustrate 
the notions of repair and consistent query answer by means of an example. 

Example 1 Consider the following relation $r$ 

employee 
Name    Salary    Dept 
john    50        cs 
john    100       cs 

and the functional dependency $f : \mathit{Name} \rightarrow \mathit{Salary}\ \mathit{Dept}$ stating that each 
employee has a unique salary and a unique department. Clearly, $r$ is inconsistent w.r.t. 
$f$ as it stores two different salaries for the same employee john. Assuming that the 
database is viewed as a set of facts and the symmetric difference is used to capture 
the distance between two databases, there exist two repairs for $r$ w.r.t. $f$, namely 
$\{employee(john, 50, cs)\}$ and $\{employee(john, 100, cs)\}$. The consistent answer to the 
query asking for the department of john is cs (as this is the answer of the query in 
both repairs), whereas the query asking for the salary of john has no consistent answer 
(as the two repairs do not agree on the answer). 
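
The notions used in Example 1 can be made concrete with a small brute-force sketch. 
The function names (`violates_fd`, `repairs`, `consistent_answers`) and the encoding of 
facts as Python tuples are assumptions made for illustration only; the enumeration is 
exponential and is meant for tiny instances such as the one above. 

```python
from itertools import combinations

# Facts of relation `employee` from Example 1, encoded as (Name, Salary, Dept).
r = {("john", 50, "cs"), ("john", 100, "cs")}

def violates_fd(s, t):
    """True if s and t agree on Name but differ on Salary or Dept."""
    return s[0] == t[0] and s[1:] != t[1:]

def is_consistent(db):
    return not any(violates_fd(s, t) for s, t in combinations(db, 2))

def repairs(db):
    """Maximal consistent subsets of db (brute force over all subsets)."""
    facts = list(db)
    consistent = [frozenset(c)
                  for k in range(len(facts) + 1)
                  for c in combinations(facts, k)
                  if is_consistent(c)]
    return {s for s in consistent if not any(s < o for o in consistent)}

def consistent_answers(db, query):
    """Tuples returned by `query` in every repair of db."""
    answers = [set(query(rep)) for rep in repairs(db)]
    return set.intersection(*answers)

def dept_of_john(db):
    return {(d,) for (n, _, d) in db if n == "john"}

def salary_of_john(db):
    return {(s,) for (n, s, _) in db if n == "john"}

print(consistent_answers(r, dept_of_john))    # the department is agreed upon
print(consistent_answers(r, salary_of_john))  # the repairs disagree on the salary
```

On this instance the first query returns the single tuple for cs, while the second 
returns the empty set, in line with the discussion of Example 1. 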

An introduction to the central concepts of consistent query answering can be found in [8], 
whereas surveys on this topic are [6,5]. 

Inconsistency leads to uncertainty as to the actual values of tuple attributes. Thus, 
it is natural to study the possible use of incomplete database frameworks in this context. 
The set of repairs for a possibly inconsistent database could be represented by means of 
an incomplete database whose possible worlds are exactly the repairs of the inconsistent 
database. 

In this paper, we consider a specific incomplete database framework: disjunctive 
databases. A disjunctive database is a finite set of disjunctions of facts. Its semantics is 
given by the set of minimal models. There is a clear intuitive connection between 
inconsistent and disjunctive databases. For instance, the repairs of the relation $r$ of 
Example 1 could be represented by the disjunctive database 
$\mathcal{D} = \{employee(john, 50, cs) \vee employee(john, 100, cs)\}$, as the minimal models of 
$\mathcal{D}$ are exactly the repairs of $r$ w.r.t. $f$. Disjunctive databases have been studied 
for a long time [12,13,15,10]. More recently, they have again attracted attention in the 
database research community because of potential applications in data integration, 
extraction and cleaning. Our approach should be distinguished from the approaches that 
rely on stable model semantics of disjunctive logic programs with negation to represent 
repairs of inconsistent databases. 

In this paper we address the problem of representing the set of repairs of a database 
w.r.t. a set of denial constraints by means of a disjunctive database (in other words, a 
disjunctive database whose minimal models are the repairs). 

We show that, given a database and a set of denial constraints, there exists a 
unique, canonical disjunctive database which (a) represents the repairs of the database 
w.r.t. the constraints, and (b) is contained in any other disjunctive database having 
the same set of minimal models. We propose an algorithm for computing the canon- 
ical disjunctive database which in general can be of exponential size. Next, we study 
the size of the canonical disjunctive database in the presence of restricted functional 
dependencies. We show that the canonical disjunctive database is of linear size when 
only one key is considered, but it may be of exponential size in the presence of two 
keys or one non-key functional dependency. Finally, we demonstrate that these results 
hold also for a different, cardinality-based semantics of repairs [14]. 

The paper is organized as follows. In Section 2 we introduce some basic notions 
in inconsistent and disjunctive databases. In Section 3 we present an algorithm to 
compute the canonical disjunctive database and show that this database is contained 
in any other disjunctive database with the same minimal models. In Section 4 we 
study the size of the canonical disjunctive databases in the presence of functional 
dependencies. In Section 5 we investigate the size of the canonical disjunctive databases 
under the cardinality-based semantics of repairs. Finally, in Section 6 we draw the 
conclusions and outline some possible future research topics. 



2 Preliminaries 

In this section we introduce some basic notions of relational, inconsistent, and disjunc- 
tive databases. 



2.1 Relational databases 

We assume the standard concepts of the relational data model. A database is a collec- 
tion of relations. Each relation is a finite set of tuples and has a finite set of attributes. 
The values of each attribute are integers, rationals or uninterpreted constants. Each 
tuple t in a relation p can be viewed as a fact p(t); then a database can be viewed as 
a finite set of facts. 

We say that a database is consistent w.r.t. a set of integrity constraints if it satisfies 
the integrity constraints, otherwise it is inconsistent. In this paper we consider the class 
of denial constraints. A denial constraint is a first-order logic sentence of the following 
form: 

$$\forall X_1 \ldots X_n\ \neg[p_1(X_1) \wedge \ldots \wedge p_n(X_n) \wedge \varphi(X_1, \ldots, X_n)]$$

where the $X_i$'s are sequences of variables, the $p_i$'s are relational symbols and $\varphi$ is a 
conjunction of atoms referring to built-in, arithmetic or comparison, predicates. Special 
cases of denial constraints are functional dependencies and key constraints. A functional 
dependency is of the form 

$$\forall X_1 X_2 X_3 X_4 X_5\ \neg[p(X_1, X_2, X_4) \wedge p(X_1, X_3, X_5) \wedge X_2 \neq X_3]$$

The previous functional dependency can be also stated as $X \rightarrow Y$, where $X$ is the 
set of attributes of $p$ corresponding to $X_1$ whereas $Y$ is the set of attributes of $p$ 
corresponding to $X_2$ (and $X_3$). A key constraint is of the form 

$$\forall X_1 X_2 X_3\ \neg[p(X_1, X_2) \wedge p(X_1, X_3) \wedge X_2 \neq X_3]$$

We say that the set of attributes corresponding to $X_1$ is a key. We assume that the 
given set of integrity constraints is satisfiable. 

2.2 Inconsistent databases 

As already mentioned in the introduction, a repair of a database w.r.t. a set 
of integrity constraints is a consistent database which "minimally" differs from the 
(possibly inconsistent) original database [2]. The symmetric difference is used to 
capture the distance between two databases. Because we consider denial constraints and 
assume that the symmetric difference has to be minimal under set inclusion, repairs 
are maximal consistent subsets of the original database (although in Section 5 we will 
consider cardinality-based repairs, where the cardinality of the symmetric difference is 
minimized). The set of repairs of a database $D$ w.r.t. a set $F$ of denial constraints is 
denoted by $\mathit{repairs}(D, F)$. 
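
The difference between the two repair semantics can be seen on a toy instance. The 
following is only a sketch under stated assumptions: facts are plain strings, and the 
constraint violations are supplied directly as a hypothetical list of conflict sets 
(groups of facts that jointly violate a constraint) rather than derived from actual 
denial constraints. 

```python
from itertools import combinations

def consistent_subsets(facts, conflicts):
    """All subsets of `facts` that contain no conflict set entirely."""
    facts = list(facts)
    conflicts = [frozenset(c) for c in conflicts]
    for k in range(len(facts) + 1):
        for c in combinations(facts, k):
            s = frozenset(c)
            if not any(e <= s for e in conflicts):
                yield s

def repairs(facts, conflicts):
    """Set-inclusion semantics: maximal consistent subsets."""
    cons = list(consistent_subsets(facts, conflicts))
    return {s for s in cons if not any(s < o for o in cons)}

def cardinality_repairs(facts, conflicts):
    """Cardinality-based semantics: consistent subsets of maximum size."""
    cons = list(consistent_subsets(facts, conflicts))
    best = max(len(s) for s in cons)
    return {s for s in cons if len(s) == best}

# Hypothetical instance: fact a conflicts with both b and c.
facts = ["a", "b", "c"]
conflicts = [("a", "b"), ("a", "c")]
print(repairs(facts, conflicts))              # two repairs: {a} and {b, c}
print(cardinality_repairs(facts, conflicts))  # only {b, c}
```

Every cardinality-based repair is also a repair under set inclusion, but not vice 
versa: here {a} is maximal under inclusion yet deletes two facts instead of one. 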

Given a database $D$ and a set $F$ of denial constraints, the conflict hypergraph [9] 
for $D$ and $F$, denoted by $\mathcal{G}_{D,F}$, is a hypergraph whose set of vertices is the set of 
facts of $D$, whereas the set of edges consists of all the sets $\{p_1(c_1), \ldots, p_n(c_n)\}$ s.t. 
$p_1(c_1), \ldots, p_n(c_n)$ are facts of $D$ which together violate a denial constraint in $F$, i.e. 
there exist a denial constraint 

$$\forall X_1 \ldots X_n\ \neg[p_1(X_1) \wedge \ldots \wedge p_n(X_n) \wedge \varphi(X_1, \ldots, X_n)]$$

in $F$ and a substitution $\rho$ s.t. $\rho(X_i) = c_i$ for $i = 1..n$ and $\varphi(c_1, \ldots, c_n)$ is true. A fact 
$t$ of $D$ is said to be conflicting (w.r.t. $F$) if it is involved in some constraint violation, 
that is there exists an edge $\{t, t_1, \ldots, t_m\}$ ($m \geq 0$) in $\mathcal{G}_{D,F}$. For a fact $t$ of $D$, we 
denote by $\mathit{edges}_{D,F}(t)$ the set of edges of $\mathcal{G}_{D,F}$ containing $t$, i.e. 
$\mathit{edges}_{D,F}(t) = \{e = \{t, t_1, \ldots, t_k\} \mid e \in E\}$, where $E$ denotes the set of edges of $\mathcal{G}_{D,F}$. 
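
For a single functional dependency every edge of the conflict hypergraph has exactly 
two facts, so the hypergraph can be built with a pairwise check. The sketch below 
assumes facts are tuples and that the dependency is given by attribute positions; the 
helper names and the extra `mary` tuple are hypothetical and not part of the paper. 

```python
from itertools import combinations

def conflict_edges_fd(relation, lhs, rhs):
    """Edges {s, t} for pairs that agree on `lhs` and differ on `rhs` positions."""
    edges = set()
    for s, t in combinations(relation, 2):
        same_lhs = all(s[i] == t[i] for i in lhs)
        diff_rhs = any(s[i] != t[i] for i in rhs)
        if same_lhs and diff_rhs:
            edges.add(frozenset({s, t}))
    return edges

def edges_of(t, edges):
    """The edges containing fact t."""
    return {e for e in edges if t in e}

# Relation of Example 1 extended with one hypothetical non-conflicting tuple.
employee = {("john", 50, "cs"), ("john", 100, "cs"), ("mary", 70, "ee")}
E = conflict_edges_fd(employee, lhs=[0], rhs=[1, 2])   # Name -> Salary Dept
conflicting = {t for t in employee if edges_of(t, E)}
print(E)
print(conflicting)
```

With a fixed set of constraints the pairwise check is quadratic in the number of 
facts, which is consistent with the polynomial bound on the conflict hypergraph under 
data complexity. 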

2.3 Disjunctive databases 

A disjunctive database $\mathcal{D}$ is a finite set of non-empty disjunctions of distinct facts. A 
disjunction containing exactly one fact is called a singleton disjunction. A set $M$ of 
facts is a model of $\mathcal{D}$ if $M \models \mathcal{D}$; $M$ is minimal if there is no $M' \subset M$ s.t. $M' \models \mathcal{D}$. 
We denote by $\mathcal{MM}(\mathcal{D})$ the set of minimal models of $\mathcal{D}$. For a disjunction $d \in \mathcal{D}$, $S_d$ 
denotes the set of facts appearing in $d$. Given two distinct disjunctions $d_1$ and $d_2$ in $\mathcal{D}$, 
we say that $d_1$ subsumes $d_2$ if the set of facts appearing in $d_1$ is a (proper) subset of 
the set of facts appearing in $d_2$, i.e. $S_{d_1} \subset S_{d_2}$. Moreover, the reduction of $\mathcal{D}$, denoted 
by $\mathit{reduction}(\mathcal{D})$, is the disjunctive database obtained from $\mathcal{D}$ by discarding all the 
subsumed disjunctions, that is 

$$\mathit{reduction}(\mathcal{D}) = \{d \mid d \in \mathcal{D} \wedge \nexists d' \in \mathcal{D} \text{ s.t. } d' \text{ subsumes } d\}.$$

Observe that for any disjunctive database $\mathcal{D}$, $\mathcal{MM}(\mathcal{D}) = \mathcal{MM}(\mathit{reduction}(\mathcal{D}))$. 
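
The definitions of this subsection can be sketched in a few lines. The encoding is an 
assumption made for illustration: a disjunction is a `frozenset` of facts, a 
disjunctive database is a set of such frozensets, and minimal models are found by 
brute-force enumeration (exponential, small inputs only). 

```python
from itertools import combinations

def minimal_models(db):
    """Minimal sets of facts that intersect every disjunction of db."""
    facts = sorted(set().union(*db)) if db else []
    candidates = [frozenset(c)
                  for k in range(len(facts) + 1)
                  for c in combinations(facts, k)]
    models = [m for m in candidates if all(m & d for d in db)]
    return {m for m in models if not any(o < m for o in models)}

def reduction(db):
    """Discard every disjunction subsumed by another one (proper subset test)."""
    return {d for d in db if not any(o < d for o in db)}

# Hypothetical database: a v b, a v b v c (subsumed by a v b), c v d.
db = {frozenset("ab"), frozenset("abc"), frozenset("cd")}
print(reduction(db))
print(minimal_models(db) == minimal_models(reduction(db)))
```

The final line illustrates the observation above on one instance: dropping the 
subsumed disjunction does not change the set of minimal models. 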

2.4 Computational complexity 

We adopt here the data complexity assumption [16], under which the complexity is 
a function of the number of facts in the database. The set of integrity constraints is 
considered fixed. In this setting, the conflict hypergraph is of polynomial size and can be 
computed in polynomial time. We study the size of a disjunctive database representing 
the set of repairs of a relational database D w.r.t. a set of integrity constraints F as a 
function of the number of facts in D. 

3 Disjunctive databases for representing repairs 

In this section we propose an algorithm to compute a disjunctive database whose 
minimal models are the repairs of a given database w.r.t. a set of denial constraints. We 
show that the so computed disjunctive database is the canonical one, that is any other 
disjunctive database whose minimal models coincide with the repairs of the original 
database is a superset of the canonical one (containing, in addition, only disjunctions 
which are subsumed by disjunctions in the canonical disjunctive database). 



Algorithm 1 

Input: a database $D$ and a set $F$ of denial constraints 
Output: a disjunctive database whose minimal models are the repairs for $D$ and $F$ 
 1: $\mathcal{D} := \emptyset$ 
 2: $D' := D - \{t \mid \{t\} \text{ is an edge of } \mathcal{G}_{D,F}\}$ 
 3: for each $t \in D'$ 
 4:     Let $\mathit{edges}_{D',F}(t) = \{e_1, \ldots, e_n\}$ 
 5:     $\mathcal{D} := \mathcal{D} \cup \{t \vee t_1 \vee \ldots \vee t_n \mid t_i \in e_i \text{ and } t_i \neq t \text{ for } i = 1..n\}$ 
 6: repeat until $\mathcal{D}$ does not change 
 7:     for each edge $e = \{t_1, \ldots, t_k\}$ in $\mathcal{G}_{D',F}$ 
 8:         for each $t_1 \vee D_1, \ldots, t_k \vee D_k \in \mathcal{D}$ s.t. $D_i$ is not an empty disjunction and 
                $D_i$ does not contain any fact $t' \neq t_i$ in $e$, $i = 1..k$ 
 9:             $\mathcal{D} := \mathcal{D} \cup \{D_1 \vee \ldots \vee D_k\}$ 
10: return $\mathit{reduction}(\mathcal{D})$ 



We denote by $\mathcal{D}(D, F)$ the disjunctive database returned by Algorithm 1 with the 
input consisting of a database $D$ and a set $F$ of denial constraints. In the second step 
of the algorithm, every fact $t$ s.t. $\{t\}$ is an edge of the conflict hypergraph is discarded. 

The disjunctions introduced in step 5 allow us to guarantee that the minimal 
models are maximal (consistent) subsets of $D$. Intuitively, a disjunction of the form 
$t \vee t_1 \vee \ldots \vee t_n$ (which contains one fact from each edge containing $t$) prevents 
$\mathcal{D}$ from having a model $m$ which contains neither $t$ nor the $t_i$'s, as in this case $m$ would 
not be maximal. 

The disjunctions introduced in step 9 allow us to guarantee that the minimal 
models of $\mathcal{D}(D, F)$ are consistent w.r.t. $F$. Specifically, the loop in lines 6-9 is performed 
until $\mathcal{D}$ satisfies the following property: for every edge $e = \{t_1, \ldots, t_k\}$ of the conflict 
hypergraph ($k > 1$), if there are $t_1 \vee D_1, \ldots, t_k \vee D_k \in \mathcal{D}$ s.t. each $D_i$ is not an empty 
disjunction, then $D_1 \vee \ldots \vee D_k$ is also in $\mathcal{D}$. As shown in the proof of Theorem 1, 
this property entails that no minimal model of $\mathcal{D}$ contains $\{t_1, \ldots, t_k\}$. 
Observe that the loop ends when $\mathcal{D}$ does not change anymore; at each iteration new 
disjunctions are added to $\mathcal{D}$. Since the number of disjunctions is bounded (if the original 
database has $h$ facts, there cannot be more than $2^h - 1$ disjunctions) the algorithm 
always terminates. In the last step of the algorithm, subsumed disjunctions are deleted. 
The following theorem states the correctness of Algorithm 1. 

Theorem 1 Given a database $D$ and a set $F$ of denial constraints, the set of minimal 
models of $\mathcal{D}(D, F)$ coincides with the set of repairs of $D$ w.r.t. $F$. 

Proof. Since the disjunctive database $\mathcal{D}(D, F)$ returned by Algorithm 1 is equal to 
$\mathit{reduction}(\mathcal{D})$ (step 10), then $\mathcal{MM}(\mathcal{D}(D, F)) = \mathcal{MM}(\mathcal{D})$. First we prove 
(1) $\mathit{repairs}(D, F) \subseteq \mathcal{MM}(\mathcal{D})$ and next (2) $\mathit{repairs}(D, F) \supseteq \mathcal{MM}(\mathcal{D})$. 

(1) Consider a repair $r$ in $\mathit{repairs}(D, F)$. First we show that (a) $r$ is a model of $\mathcal{D}$ and 
next (b) that it is a minimal model. 

(a) We prove that $r$ satisfies each disjunction in $\mathcal{D}$ by induction. Specifically, as base 
case we consider the disjunctions introduced in step 5 of the algorithm, whereas the 
inductive step refers to the disjunctions introduced in step 9. Suppose by contradiction 
that $r$ does not satisfy a disjunction $t \vee t_1 \vee \ldots \vee t_n$ introduced in step 5. Observe 
that $\mathit{edges}_{D',F}(t) \subseteq \mathit{edges}_{D,F}(t)$ and each edge $e'$ in $\mathit{edges}_{D,F}(t) - \mathit{edges}_{D',F}(t)$ is 
s.t. there is a fact $t' \in e'$ s.t. $\{t'\}$ is an edge of $\mathcal{G}_{D,F}$ (clearly, $t' \notin r$). Since in each 
edge in $\mathit{edges}_{D,F}(t)$ there is a fact (different from $t$) which is not in $r$, then $r \cup \{t\}$ is 
consistent, which violates the maximality of $r$. The inductive step consists in showing 
that $r$ satisfies any disjunction added to $\mathcal{D}$ in step 9 assuming that $r$ satisfies $\mathcal{D}$. 
A disjunction $D_1 \vee \ldots \vee D_k$, where the $D_i$'s are not empty disjunctions, is added to 
$\mathcal{D}$ whenever there exist $t_1 \vee D_1, \ldots, t_k \vee D_k$ in $\mathcal{D}$ s.t. $e = \{t_1, \ldots, t_k\}$ is an edge of 
$\mathcal{G}_{D',F}$, and $D_i$ does not contain any fact $t' \neq t_i$ in $e$, for $i = 1..k$. Since $r$ satisfies all 
the disjunctions $t_1 \vee D_1, \ldots, t_k \vee D_k$ and does not contain some fact $t_j$ in $e$ (as $e$ is 
an edge of $\mathcal{G}_{D,F}$ too), it satisfies the disjunction $D_j$ and then $D_1 \vee \ldots \vee D_k$ as well. 
Hence $r$ is a model of $\mathcal{D}$. 

(b) We now show that $r$ is a minimal model, reasoning by contradiction. Assume that 
there exists a model $m' \subset r$ and let $t$ be a fact in $r$ but not in $m'$. Observe that $t$ is a 
conflicting fact (it cannot be the case that there is a model of $\mathcal{D}$ which does not contain 
a non-conflicting fact because the algorithm introduces, in step 5, a singleton 
disjunction $d$ for each non-conflicting fact $d$). Moreover, as $r$ is a repair, $t$ is s.t. $\{t\}$ is not 
an edge of $\mathcal{G}_{D,F}$ and then $t$ is in $D'$. For each edge $e_i$ in $\mathit{edges}_{D',F}(t) = \{e_1, \ldots, e_n\}$ 
there is a fact $t_i \neq t$ which is not in $r$, as $r$ is consistent and $\mathit{edges}_{D',F}(t) \subseteq \mathit{edges}_{D,F}(t)$. 
The same holds for $m'$ as it is a subset of $r$. Then, the disjunction $t \vee t_1 \vee \ldots \vee t_n$ in 
$\mathcal{D}$ (added in step 5) is not satisfied by $m'$, which contradicts that $m'$ is a model. 
Hence $r$ is a minimal model of $\mathcal{D}$. 

(2) Consider a minimal model $m$ in $\mathcal{MM}(\mathcal{D})$. We show first (a) that it is consistent 
w.r.t. $F$ and then (b) that it is maximal. 

(a) First of all, it is worth noting that $\mathcal{D}$ does not contain a singleton disjunction $t$ s.t. $t$ is 
a conflicting fact of $D$. This can be shown as follows. Two cases may occur: either $\{t\}$ is 
an edge of $\mathcal{G}_{D,F}$ or it is not. As for the first case, since we have proved above that each 
repair of $D$ and $F$ is a model of $\mathcal{D}$ and no repair contains $t$, it cannot be the case that $t$ 
is a singleton disjunction of $\mathcal{D}$. Let us consider the second case. For any conflicting fact $t$ 
in $D$ s.t. $\{t\}$ is not an edge of $\mathcal{G}_{D,F}$, there exist a repair $r_1$ s.t. $t \in r_1$ and a repair $r_2$ s.t. 
$t \notin r_2$. As we have proved above, there are two minimal models of $\mathcal{D}$ corresponding to 
$r_1$ and $r_2$, then it cannot be the case that $t \in \mathcal{D}$. We prove that $m$ is consistent w.r.t. $F$ 
by contradiction, assuming that $m$ contains a set of facts $t_1, \ldots, t_k$ s.t. $e = \{t_1, \ldots, t_k\}$ 
is in $\mathcal{G}_{D,F}$. Let $S_{t_i} = \{D \mid t_i \vee D \in \mathcal{D} \text{ and } D \text{ does not contain any fact } t' \neq t_i \text{ in } e\}$ 
for $i = 1..k$. Two cases may occur: either (a) there is a set $S_{t_j}$ which is empty or (b) 
all the sets are not empty. (a) Let $t_j$ be a fact in $e$ s.t. $S_{t_j}$ is empty. It is easy 
to see that $m - \{t_j\}$ is a model, which contradicts the minimality of $m$. (b) For each 
$D_1 \in S_{t_1}, \ldots, D_k \in S_{t_k}$, it holds that $D_1 \vee \ldots \vee D_k \in \mathcal{D}$. Then there is a set $S_{t_j}$ s.t. 
$m$ satisfies each $D$ in $S_{t_j}$, otherwise it would be the case that some $D_1 \vee \ldots \vee D_k$ in 
$\mathcal{D}$, where $D_i$ is in $S_{t_i}$ for $i = 1..k$, is not satisfied. It is easy to see that $m - \{t_j\}$ is a 
model, which contradicts the minimality of $m$. Hence $m$ is consistent w.r.t. $F$. 

(b) Now we prove that $m$ is a maximal (consistent) subset of $D$ reasoning by 
contradiction, thus assuming that there exists $m' \supset m$ which is consistent. Let $t$ be a fact in 
$m'$ but not in $m$. Since $m'$ is consistent, for each edge $e_i$ in $\mathit{edges}_{D',F}(t) = \{e_1, \ldots, e_n\}$ 
there is a fact $t_i \neq t$ which is not in $m'$. The same holds for $m$ as it is a (proper) subset 
of $m'$. This implies that $m$ does not satisfy the disjunction $t \vee t_1 \vee \ldots \vee t_n$ in $\mathcal{D}$ (added 
in step 5), thus contradicting the fact that $m$ is a model. Hence $m$ is a maximal 
consistent subset of $D$, that is a repair. □ 
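
The steps of Algorithm 1 can be sketched in Python and checked against Theorem 1 on 
small inputs. This is an illustrative sketch, not the authors' implementation: the 
conflict hypergraph is assumed to be given as a list of edges, disjunctions are 
encoded as frozensets of facts, and the helpers `repairs` and `minimal_models` are 
brute-force checkers introduced only for the comparison. 

```python
from itertools import combinations, product

def algorithm1(facts, edges):
    """Sketch of Algorithm 1; returns a set of disjunctions (frozensets of facts)."""
    edges = {frozenset(e) for e in edges}
    # Step 2: drop facts t such that {t} is an edge.
    d_prime = {t for t in facts if frozenset([t]) not in edges}
    edges_p = [e for e in edges if e <= d_prime]  # edges of the hypergraph for D'
    db = set()
    # Steps 3-5: t together with one other fact from each edge containing t.
    for t in d_prime:
        incident = [e - {t} for e in edges_p if t in e]
        for pick in product(*incident):
            db.add(frozenset((t,) + pick))
    # Steps 6-9: saturate until no new disjunction appears.
    changed = True
    while changed:
        changed = False
        for e in edges_p:
            # For each t_i in e: all non-empty D_i with t_i v D_i in db such that
            # t_i is the only fact of e occurring in t_i v D_i.
            pools = [[d - {t} for d in db if d & e == {t} and len(d) > 1]
                     for t in e]
            for combo in product(*pools):
                new = frozenset().union(*combo)
                if new not in db:
                    db.add(new)
                    changed = True
    # Step 10: discard subsumed disjunctions.
    return {d for d in db if not any(o < d for o in db)}

def subsets(items):
    items = list(items)
    for k in range(len(items) + 1):
        for c in combinations(items, k):
            yield frozenset(c)

def repairs(facts, edges):
    """Brute force: maximal subsets of facts containing no edge."""
    edges = [frozenset(e) for e in edges]
    cons = [s for s in subsets(facts) if not any(e <= s for e in edges)]
    return {s for s in cons if not any(s < o for o in cons)}

def minimal_models(db):
    """Brute force: minimal sets of facts intersecting every disjunction."""
    universe = set().union(*db) if db else set()
    models = [s for s in subsets(universe) if all(s & d for d in db)]
    return {m for m in models if not any(o < m for o in models)}

# Example 1: two conflicting employee facts.
emp50, emp100 = ("john", 50, "cs"), ("john", 100, "cs")
example_db = algorithm1([emp50, emp100], [(emp50, emp100)])
print(example_db)
print(minimal_models(example_db) == repairs([emp50, emp100], [(emp50, emp100)]))
```

On Example 1 the sketch returns the single disjunction of the two employee facts, the 
disjunctive database given in the introduction. The saturation loop can generate 
exponentially many disjunctions, so the sketch is only practical for small inputs. 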

Given a database $D$ with $n$ facts, a rough bound on the size of $\mathcal{D}(D, F)$ is that 
it cannot have more than $2^n - 1$ disjunctions and each disjunction contains at most 
$n$ facts, for any set $F$ of denial constraints (in the next section we will study more 
precisely the size of $\mathcal{D}(D, F)$ for special classes of denial constraints, namely functional 
dependencies and key constraints). 

The following theorem allows us to identify all the disjunctive databases which have 
the same minimal models as a given disjunctive database. Specifically, it states that 
given a disjunctive database $\mathcal{D}$, any other disjunctive database with the same minimal 
models is a superset of $\mathit{reduction}(\mathcal{D})$ containing in addition only disjunctions subsumed 
by disjunctions in $\mathit{reduction}(\mathcal{D})$. This result allows us to state that there is a (unique) 
disjunctive database representing the repairs for a given database and a set of denial 
constraints which is contained in any other disjunctive database with the same set of 
minimal models. We call such a disjunctive database canonical. Algorithm 1 computes 
the canonical disjunctive database (see Corollary 1). 

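Theorem 2 Given a disjunctive database $\mathcal{D}$, the set $\mathcal{R}$ of all disjunctive databases 
having the same minimal models as $\mathcal{D}$ is equal to: 

$$\mathcal{R} = \{\mathcal{D}' \mid \mathit{reduction}(\mathcal{D}) \subseteq \mathcal{D}' \wedge \forall d' \in \mathcal{D}' - \mathit{reduction}(\mathcal{D})\ \exists d \in \mathit{reduction}(\mathcal{D}) \text{ which subsumes } d'\}$$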

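Proof. We denote by $\mathcal{S}(\mathcal{D})$ the set of all the disjunctive databases whose minimal 
models are $\mathcal{MM}(\mathcal{D})$. In order to prove that $\mathcal{R} = \mathcal{S}(\mathcal{D})$, first we show that (1) each 
disjunctive database in $\mathcal{R}$ is also in $\mathcal{S}(\mathcal{D})$ and next that (2) each disjunctive database 
in $\mathcal{S}(\mathcal{D})$ is in $\mathcal{R}$ too. 

(1) Consider a disjunctive database $\mathcal{D}'$ in $\mathcal{R}$. It is easy to see that $\mathit{reduction}(\mathcal{D}') = 
\mathit{reduction}(\mathcal{D})$. As a disjunctive database and its reduction have the same minimal 
models, $\mathcal{MM}(\mathcal{D}') = \mathcal{MM}(\mathcal{D})$ and hence $\mathcal{D}'$ is in $\mathcal{S}(\mathcal{D})$. 

(2) We show that any disjunctive database not belonging to $\mathcal{R}$ is not in $\mathcal{S}(\mathcal{D})$. We 
recall that for a disjunction $d$, $S_d$ denotes the set of facts appearing in $d$. Consider a 
disjunctive database $\mathcal{D}_{out}$ which is not in $\mathcal{R}$. Two cases may occur: (a) $\mathit{reduction}(\mathcal{D}) \not\subseteq 
\mathcal{D}_{out}$ or (b) $\mathit{reduction}(\mathcal{D}) \subseteq \mathcal{D}_{out}$ and $\exists d' \in \mathcal{D}_{out} - \mathit{reduction}(\mathcal{D})$ s.t. there is no 
$d \in \mathit{reduction}(\mathcal{D})$ which subsumes $d'$. 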

(a) As $\mathit{reduction}(\mathcal{D}) \not\subseteq \mathcal{D}_{out}$, there is a disjunction $a$ in $\mathit{reduction}(\mathcal{D})$ which is not in 
$\mathcal{D}_{out}$. Two cases may occur: 

- there exists $a_1 \in \mathcal{D}_{out}$ which subsumes $a$; 

- the previous condition does not hold. 

Let us consider the first case and let $I$ be the interpretation $S - S_{a_1}$ where $S$ is 
the set of facts appearing in $\mathit{reduction}(\mathcal{D})$. It is easy to see that $I$ is a model of 
$\mathit{reduction}(\mathcal{D})$ (the only disjunctions that $I$ could not satisfy are those ones that 
contain only facts in $S_{a_1}$; such disjunctions are not in $\mathit{reduction}(\mathcal{D})$ as they subsume $a$ and 
$\mathit{reduction}(\mathcal{D})$ does not contain two disjunctions s.t. one subsumes the other). Then, 
there exists $M \subseteq I$ which is a minimal model of $\mathit{reduction}(\mathcal{D})$. As $a_1 \in \mathcal{D}_{out}$, each 
model of $\mathcal{D}_{out}$ contains a fact in $S_{a_1}$, then $M$ is not a minimal model of $\mathcal{D}_{out}$ and so 
$\mathcal{MM}(\mathit{reduction}(\mathcal{D})) \neq \mathcal{MM}(\mathcal{D}_{out})$. Hence $\mathcal{D}_{out} \notin \mathcal{S}(\mathcal{D})$. 

We consider now the second case. We show that $\mathcal{D}_{out} \notin \mathcal{S}(\mathcal{D})$ in a similar way to the 
previous case. Let $I$ be the interpretation $S - S_a$ where $S$ is the set of facts appearing 
in $\mathcal{D}_{out}$. It is easy to see that $I$ is a model of $\mathcal{D}_{out}$ (the only disjunctions that $I$ could 
not satisfy are those ones which contain only facts in $S_a$; such disjunctions are not 
in $\mathcal{D}_{out}$ as $\mathcal{D}_{out}$ contains neither $a$ nor a disjunction which subsumes $a$). Then, there 
exists $M \subseteq I$ which is a minimal model of $\mathcal{D}_{out}$. As $a \in \mathit{reduction}(\mathcal{D})$, each model of 
$\mathit{reduction}(\mathcal{D})$ contains a fact in $S_a$, then $M$ is not a minimal model of $\mathit{reduction}(\mathcal{D})$; 
hence $\mathcal{D}_{out} \notin \mathcal{S}(\mathcal{D})$. 

(b) Let $I$ be the interpretation $S - S_{d'}$ where $S$ is the set of facts appearing in 
$\mathit{reduction}(\mathcal{D})$. It is easy to see that $I$ is a model of $\mathit{reduction}(\mathcal{D})$ (the only disjunctions 
that $I$ could not satisfy are $d'$ and those ones which subsume $d'$). Then, there exists 
$M \subseteq I$ which is a minimal model of $\mathit{reduction}(\mathcal{D})$. As $d' \in \mathcal{D}_{out}$, each model of $\mathcal{D}_{out}$ 
contains a fact in $S_{d'}$, then $M$ is not a minimal model of $\mathcal{D}_{out}$; hence $\mathcal{D}_{out} \notin \mathcal{S}(\mathcal{D})$. □ 
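
The characterization in Theorem 2 can be tried out on small hypothetical databases. 
In the sketch below, `in_R` is an assumed name for the membership test of the set 
described in the theorem, disjunctions are frozensets of single-letter facts, and 
minimal models are enumerated by brute force. 

```python
from itertools import combinations

def minimal_models(db):
    facts = sorted(set().union(*db)) if db else []
    candidates = [frozenset(c)
                  for k in range(len(facts) + 1)
                  for c in combinations(facts, k)]
    models = [m for m in candidates if all(m & d for d in db)]
    return {m for m in models if not any(o < m for o in models)}

def reduction(db):
    return {d for d in db if not any(o < d for o in db)}

def in_R(db, other):
    """True if `other` contains reduction(db) and every extra disjunction of
    `other` is subsumed by some disjunction of reduction(db)."""
    red = reduction(db)
    extra = other - red
    return red <= other and all(any(d < x for d in red) for x in extra)

db = {frozenset("ab"), frozenset("cd")}      # a v b, c v d
with_subsumed = db | {frozenset("abc")}      # adds a v b v c (subsumed by a v b)
missing_one = {frozenset("ab")}              # lacks c v d
with_unrelated = db | {frozenset("e")}       # adds e, subsumed by nothing

for other in (with_subsumed, missing_one, with_unrelated):
    print(in_R(db, other), minimal_models(other) == minimal_models(db))
```

In each of the three cases the two printed values coincide, which is what the theorem 
predicts: membership in the set and equality of minimal models go together. 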

Corollary 1 Given a database $D$ and a set $F$ of denial constraints, then $\mathcal{D}(D, F)$ is 
the canonical disjunctive database whose minimal models are the repairs for $D$ and $F$. 

Proof. Straightforward from Theorems 1 and 2. □ 

From now on, we will denote by $\mathcal{D}_{min}(D, F)$ the canonical disjunctive database 
whose minimal models are the repairs for a database $D$ and a set $F$ of denial constraints. 
Whenever $D$ and $F$ are clear from the context, we simply write $\mathcal{D}_{min}$ instead of 
$\mathcal{D}_{min}(D, F)$. 



4 Functional dependencies 

In this section we study the size of the canonical disjunctive database representing the 
repairs of a database in the presence of functional dependencies. Specifically, we show 
that when the constraints consist of only one key, the canonical disjunctive database 
is of linear size, whereas for one non-key functional dependency or two keys the size of 
the canonical database may be exponential. 

We observe that in the presence of only one functional dependency, the conflict 
hypergraph has a regular structure that "induces" a regular disjunctive database which 
can be identified without performing Algorithm 1. When two key constraints are 
considered, we are not able to provide such a characterization; this is because the conflict 
hypergraph can have an irregular structure and it is harder to identify a pattern for 
the canonical disjunctive database. 
Given a disjunction $d$, we denote by $||d||$ the number of facts occurring in $d$. The 
size of a disjunctive database $\mathcal{D}$, denoted as $||\mathcal{D}||$, is the number of facts occurring in 
it, that is $||\mathcal{D}|| = \sum_{d \in \mathcal{D}} ||d||$. We study the size $||\mathcal{D}_{min}||$ of $\mathcal{D}_{min}$ as a function of the 
size of the given database. 

One key. Given a relation $r$ and a key constraint $k$ stating that the set $X$ of attributes 
is a key of $r$, we denote by $\mathit{cliques}(r, k)$ the partition of $r$ into $n = |\pi_X(r)|$ sets 
$C_1, \ldots, C_n$, called cliques, s.t. each $C_i$ does not contain two facts with different values 
on $X$. Observe that (i) facts in the same clique are pairwise conflicting with each other, 
(ii) the set of repairs of $r$ w.r.t. $k$ is $\{\{t_1, \ldots, t_n\} \mid t_i \in C_i \text{ for } i = 1..n\}$. 

Proposition 1 Given a relation $r$ and a key constraint $k$, then $\mathcal{D}_{min}$ is equal to 

$$\{t_1 \vee \ldots \vee t_m \mid \exists C = \{t_1, \ldots, t_m\} \in \mathit{cliques}(r, k)\}$$

Proof. It is straightforward to see that the minimal models of the disjunctive database 
reported above are the repairs of $r$ w.r.t. $k$; since it coincides with its reduction, 
Theorem 2 implies that it is the canonical one. □ 

It is easy to see that when one key constraint is considered, $||\mathcal{D}_{min}|| = |r|$. 

Proposition 2 Given a relation and a key constraint, $\mathcal{D}_{min}$ is computed in polynomial 
time by Algorithm 1. 

Proof. It is easy to see that after the first loop (steps 3-5) Algorithm 1 produces $\mathcal{D}_{min}$ 
and, after that, step 9 is never performed. □ 
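
For a single key, the canonical disjunctive database of Proposition 1 can be built by 
grouping facts on their key value. The sketch below assumes facts are tuples and the 
key is given as a list of attribute positions; the function names and the sample 
relation are hypothetical. 

```python
from collections import defaultdict

def canonical_db_one_key(relation, key):
    """One disjunction per clique, i.e. per group of facts sharing the key value."""
    cliques = defaultdict(set)
    for t in relation:
        cliques[tuple(t[i] for i in key)].add(t)
    return {frozenset(c) for c in cliques.values()}

def size(db):
    """Number of fact occurrences in a disjunctive database."""
    return sum(len(d) for d in db)

# Hypothetical relation with key Name (position 0).
r = {("john", 50), ("john", 100), ("mary", 70), ("mary", 80), ("ann", 60)}
db = canonical_db_one_key(r, key=[0])
print(len(db), size(db), len(r))
```

Since the cliques partition the relation, each fact occurs in exactly one 
disjunction, so the size of the result equals the number of facts in the relation. 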

Two keys. We now show that, in the presence of two key constraints, $\mathcal{D}_{min}$ may have 
exponential size. Let $D_n$ ($n > 0$) be the family of databases, containing $3n$ facts, of 
the following form: 

            A       B 
$t_{11}$    $a$     $b_1$ 
...         ...     ... 
$t_{n1}$    $a$     $b_n$ 
$t_{12}$    $a_1$   $b_1$ 
$t_{13}$    $a_1$   $b'_1$ 
...         ...     ... 
$t_{n2}$    $a_n$   $b_n$ 
$t_{n3}$    $a_n$   $b'_n$ 

Let $D \in D_n$ and $A$, $B$ be two keys. The conflict hypergraph for $D$ w.r.t. the two key 
constraints consists of the following edges: 

$$\{\{t_{i1}, t_{j1}\} \mid 1 \leq i, j \leq n \wedge i \neq j\} \cup \{\{t_{i1}, t_{i2}\} \mid 1 \leq i \leq n\} \cup \{\{t_{i2}, t_{i3}\} \mid 1 \leq i \leq n\}$$

Thus, the conflict hypergraph contains a clique $\{t_{11}, \ldots, t_{n1}\}$ of size $n$ and, moreover, 
$t_{i1}$ is connected to $t_{i2}$ which is in turn connected to $t_{i3}$ ($i = 1..n$). 
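
The family of databases above and its conflict edges can be generated mechanically. 
The sketch below is an illustration under assumed encodings: each fact is a triple 
(name, A-value, B-value) with string constants, and `key_conflicts` checks the two key 
constraints pairwise. 

```python
from itertools import combinations

def family_instance(n):
    """Facts (name, A, B) of the database with 3n facts described above."""
    facts = []
    for i in range(1, n + 1):
        facts.append((f"t{i}1", "a", f"b{i}"))
        facts.append((f"t{i}2", f"a{i}", f"b{i}"))
        facts.append((f"t{i}3", f"a{i}", f"b{i}'"))
    return facts

def key_conflicts(facts):
    """Pairs violating key A (same A, different B) or key B (same B, different A)."""
    edges = set()
    for (s, sa, sb), (t, ta, tb) in combinations(facts, 2):
        if (sa == ta and sb != tb) or (sb == tb and sa != ta):
            edges.add(frozenset({s, t}))
    return edges

edges = key_conflicts(family_instance(4))
print(len(edges))
print(sorted(sorted(e) for e in edges))
```

Counting the three groups of edges gives n(n-1)/2 edges inside the clique plus 2n 
further edges, that is 14 edges for n = 4. 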

Example 2 The conflict hypergraph for a database in $D_4$, assuming that $A$ and $B$ are 
two keys, is reported in Figure 1. 

Fig. 1 Conflict hypergraph for a database in $D_4$ w.r.t. $A$, $B$ key constraints 

The following proposition identifies the canonical disjunctive database for a database 
in $D_n$ for which $A$ and $B$ are keys; such a disjunctive database has exponential size. 

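Proposition 3 Consider a database $D$ in $D_n$ and a set of constraints $F$ consisting of 
two keys, $A$ and $B$. Then $\mathcal{D}_{min}$ is equal to $\mathcal{D}$ where 

$$\mathcal{D} = \{t_{i2} \vee t_{i3} \mid 1 \leq i \leq n\} \cup \Big\{t_{i1} \vee t_{i2} \vee \bigvee_{z = 1..n \wedge z \neq i} t_z \;\Big|\; 1 \leq i \leq n \wedge t_z \in \{t_{z1}, t_{z3}\}\Big\}$$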
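
The exponential growth stated in Proposition 3 can be observed by enumerating the 
disjunctions of the formula above. In this sketch the fact $t_{ij}$ is encoded as the 
pair `(i, j)`; the function name is an assumption made for illustration. 

```python
from itertools import product

def canonical_two_keys(n):
    """Disjunctions of Proposition 3; the fact t_ij is encoded as the pair (i, j)."""
    db = {frozenset({(i, 2), (i, 3)}) for i in range(1, n + 1)}
    for i in range(1, n + 1):
        others = [z for z in range(1, n + 1) if z != i]
        # For every z != i choose either t_z1 or t_z3.
        for choice in product((1, 3), repeat=len(others)):
            rest = {(z, j) for z, j in zip(others, choice)}
            db.add(frozenset({(i, 1), (i, 2)} | rest))
    return db

for n in range(1, 6):
    db = canonical_two_keys(n)
    print(n, len(db), sum(len(d) for d in db))
```

A direct count on the formula gives n disjunctions of the first kind and 
n * 2^(n-1) of the second kind, each of the latter containing n + 1 facts, so the 
number of fact occurrences grows exponentially while the database itself has only 
3n facts. 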

Proof. First of all, we show that the minimal models of $\mathcal{D}$ are the repairs of $D$ w.r.t. 
$F$; in particular we prove that (1) $\mathcal{MM}(\mathcal{D}) \subseteq \mathit{repairs}(D, F)$ and (2) $\mathcal{MM}(\mathcal{D}) \supseteq 
\mathit{repairs}(D, F)$. 

(1) Consider a minimal model $m \in \mathcal{MM}(\mathcal{D})$. First we show that (a) $m$ is consistent 
w.r.t. $F$ and next (b) that it is maximal. 

(a) Let $E$ be the set of edges of $\mathcal{G}_{D,F}$. First we show that for each $e = \{t', t''\}$ in $E$ 
and pair of disjunctions $d' = t' \vee D'$, $d'' = t'' \vee D''$ in $\mathcal{D}$ s.t. $D'$ (resp. $D''$) does not 
contain $t''$ (resp. $t'$), there is a disjunction in $\mathcal{D}$ which is equal to or subsumes $D' \vee D''$; 
next we show that this property implies that $m$ is consistent w.r.t. $F$. We recall that 
$E$ is the union of the following three sets: 

$E_1 = \{\{t_{i1}, t_{j1}\} \mid 1 \leq i, j \leq n \wedge i \neq j\}$ 

$E_2 = \{\{t_{i1}, t_{i2}\} \mid 1 \leq i \leq n\}$ 

$E_3 = \{\{t_{i2}, t_{i3}\} \mid 1 \leq i \leq n\}$ 

Let us consider the case where $e \in E_1$, that is $e = \{t_{i1}, t_{j1}\}$ ($1 \leq i, j \leq n \wedge i \neq j$). 
Then a disjunction in $\mathcal{D}$ containing $t_{i1}$ but not $t_{j1}$ is of the form 

$$d'_1 : t_{i1} \vee t_{i2} \vee t_{j3} \vee \bigvee_{z = 1..n \wedge z \neq i,j} t'_z$$

where $t'_z \in \{t_{z1}, t_{z3}\}$, or of the form 

$$d'_2 : t_{h1} \vee t_{h2} \vee t_{i1} \vee t_{j3} \vee \bigvee_{z = 1..n \wedge z \neq h,i,j} t'_z$$

where $1 \leq h \leq n \wedge h \neq i,j$ and $t'_z \in \{t_{z1}, t_{z3}\}$. Likewise, a disjunction in $\mathcal{D}$ that 
contains $t_{j1}$ but not $t_{i1}$ is of the form 

$$d''_1 : t_{j1} \vee t_{j2} \vee t_{i3} \vee \bigvee_{z = 1..n \wedge z \neq i,j} t''_z$$

where $t''_z \in \{t_{z1}, t_{z3}\}$, or of the form 

$$d''_2 : t_{k1} \vee t_{k2} \vee t_{j1} \vee t_{i3} \vee \bigvee_{z = 1..n \wedge z \neq k,i,j} t''_z$$

where $1 \leq k \leq n \wedge k \neq i,j$ and $t''_z \in \{t_{z1}, t_{z3}\}$. In all the four possible cases, there is a 
disjunction in $\mathcal{D}$ which subsumes $D' \vee D''$: 

- if $d' = d'_1$ and $d'' = d''_1$, then there exist both $t_{j2} \vee t_{j3}$ and $t_{i2} \vee t_{i3}$ in $\mathcal{D}$ which 
subsume $D' \vee D''$; 

- if $d' = d'_1$ and $d'' = d''_2$, then there exists $t_{i2} \vee t_{i3}$ in $\mathcal{D}$ which subsumes $D' \vee D''$; 

- if $d' = d'_2$ and $d'' = d''_1$, then there exists $t_{j2} \vee t_{j3}$ in $\mathcal{D}$ which subsumes $D' \vee D''$; 

- if $d' = d'_2$ and $d'' = d''_2$, then both $t_{h1} \vee t_{h2} \vee t_{i3} \vee t_{j3} \vee \bigvee_{z = 1..n \wedge z \neq h,i,j} t'_z$ and 
$t_{k1} \vee t_{k2} \vee t_{i3} \vee t_{j3} \vee \bigvee_{z = 1..n \wedge z \neq k,i,j} t''_z$, which are in $\mathcal{D}$, subsume $D' \vee D''$. 

Let us consider the case where $e \in E_2$, namely $e = \{t_{i1}, t_{i2}\}$ ($1 \leq i \leq n$). A disjunction 
containing $t_{i1}$ but not $t_{i2}$ is of the form 

$$t_{k1} \vee t_{k2} \vee t_{i1} \vee \bigvee_{z = 1..n \wedge z \neq i,k} t_z$$

where $1 \leq k \leq n \wedge k \neq i$ and $t_z \in \{t_{z1}, t_{z3}\}$, whereas a disjunction containing $t_{i2}$ but 
not $t_{i1}$ is of the form $t_{i2} \vee t_{i3}$. Thus, $D' \vee D''$, which is equal to 

$$t_{k1} \vee t_{k2} \vee t_{i3} \vee \bigvee_{z = 1..n \wedge z \neq i,k} t_z$$

is in $\mathcal{D}$. Finally, consider the last case where $e \in E_3$, that is $e = \{t_{i2}, t_{i3}\}$ ($1 \leq i \leq n$). 
A disjunction containing $t_{i2}$ but not $t_{i3}$ is of the form 

$$t_{i1} \vee t_{i2} \vee \bigvee_{z = 1..n \wedge z \neq i} t_z$$

where $t_z \in \{t_{z1}, t_{z3}\}$, whereas a disjunction containing $t_{i3}$ but not $t_{i2}$ is of the form 

$$t_{h1} \vee t_{h2} \vee t_{i3} \vee \bigvee_{z = 1..n \wedge z \neq h,i} t''_z$$

where $1 \leq h \leq n \wedge h \neq i$ and $t''_z \in \{t_{z1}, t_{z3}\}$; $D' \vee D''$ is subsumed by or equal to the 
disjunction 

$$t_{h1} \vee t_{h2} \vee t_{i1} \vee \bigvee_{z = 1..n \wedge z \neq h,i} t''_z$$

which is in $\mathcal{D}$. 

Assume by contradiction that $m$ is not consistent. Then there are two facts $t_a, t_b \in 
m$ s.t. $\{t_a, t_b\} \in E$. Let $S_{t_a} = \{D \mid t_a \vee D \in \mathcal{D} \text{ and } D \text{ does not contain } t_b\}$ and 
$S_{t_b} = \{D \mid t_b \vee D \in \mathcal{D} \text{ and } D \text{ does not contain } t_a\}$. As we have seen before, both 
these sets are not empty. We have previously proved that for each $D_a \in S_{t_a}$ and 
$D_b \in S_{t_b}$ there is a disjunction in $\mathcal{D}$ which equals or subsumes $D_a \vee D_b$. Then, there 
is a set $S_{t_x}$ among $S_{t_a}$ and $S_{t_b}$ s.t. $m$ satisfies each $D$ in $S_{t_x}$, otherwise there would 
be $D_a \in S_{t_a}$, $D_b \in S_{t_b}$ and a disjunction in $\mathcal{D}$ which is equal to or subsumes $D_a \vee D_b$ 
which is not satisfied by $m$. Consider the interpretation $m' = m - \{t_x\}$ and let $t_y$ 
be the fact among $t_a$ and $t_b$ which is not $t_x$. We now show that $m'$ is a model, which 
contradicts the minimality of $m$. Clearly, $m'$ satisfies every disjunction in $\mathcal{D}$ which does 
not contain $t_x$. As for the disjunctions in $\mathcal{D}$ containing $t_x$, it is easy to see that they 
are satisfied by $m'$: disjunctions containing $t_y$ are satisfied since $t_y \in m'$, disjunctions 
not containing $t_y$ are satisfied as well since $m$ satisfies every disjunction in $S_{t_x}$. Hence 
$m$ is consistent w.r.t. $F$. 

(b) Now we prove that $m$ is a maximal (consistent) subset of $D$. First of all, we note that for each fact $t \in D$ there is a disjunction $t \vee t_1 \vee \ldots \vee t_n$ in $\mathcal{D}$ s.t. $t_1, \ldots, t_n$ are facts conflicting with $t$:

— for the facts $t_{i2}$ and $t_{i3}$ ($i = 1..n$) such disjunctions are $t_{i2} \vee t_{i3}$;

— for the facts $t_{i1}$ ($i = 1..n$) such disjunctions are $t_{i1} \vee t_{i2} \vee \bigvee_{z=1..n \wedge z \ne i} t_{z1}$.

Assume by contradiction that $m$ is not a maximal (consistent) subset of $D$. Then there exists $m' \supset m$ which is consistent. Let $t$ be a fact in $m'$ but not in $m$. Since $m'$ is consistent, each fact conflicting with $t$ is not in $m'$ and, thus, neither in $m$. This implies that $m$ doesn't satisfy the disjunction $t \vee t_1 \vee \ldots \vee t_n$ containing $t$ and some facts conflicting with it: the fact that $m$ is a model is contradicted.

(2) Consider a repair $r$ for $D$ and $F$. We show first (a) that $r$ is a model of $\mathcal{D}$ and next (b) that it is a minimal model.

(a) Suppose by contradiction that $r$ is not a model of $\mathcal{D}$; then there is a disjunction $d \in \mathcal{D}$ which is not satisfied by $r$. Specifically, $d$ is either of the form $t_{i2} \vee t_{i3}$ ($1 \le i \le n$) or $t_{i1} \vee t_{i2} \vee \bigvee_{z=1..n \wedge z \ne i} t_z$, $1 \le i \le n$ and $t_z \in \{t_{z1}, t_{z3}\}$. In the former case, $r \cup \{t_{i3}\}$ is consistent, since the only fact conflicting with $t_{i3}$, namely $t_{i2}$, is not in $r$. This contradicts the maximality of $r$. As for the latter case, let $T_3 = \{t_{j3} \mid t_{j3}$ appears in $d\}$. For each $t_{j3} \in T_3$ we have that $t_{j2} \in r$, because $r$ does not contain $t_{j3}$ and $t_{j3}$ is conflicting only with $t_{j2}$ (if $t_{j2}$ was not in $r$, then $r$ would not be maximal). Then for each $t_{j3} \in T_3$, since $r$ contains $t_{j2}$, it does not contain $t_{j1}$, otherwise it would not be consistent. Thus $r$ does not contain any fact $t_{k1}$ with $1 \le k \le n \wedge k \ne i$. Since $r$ contains neither the facts $t_{k1}$'s nor $t_{i2}$, which are all the facts conflicting with $t_{i1}$, then $r \cup \{t_{i1}\}$ is consistent (observe that $t_{i1} \notin r$). This contradicts the maximality of $r$. Hence $r$ is a model of $\mathcal{D}$.

(b) We now show that $r$ is a minimal model of $\mathcal{D}$, reasoning by contradiction. Assume that there exists a model $m' \subset r$ of $\mathcal{D}$ and let $t$ be a fact in $r$ but not in $m'$. All the facts conflicting with $t$ are not in $r$ as $r$ is consistent. The same holds for $m'$ since it is a (proper) subset of $r$. We recall that for each fact $t \in D$ there is a disjunction in $\mathcal{D}$ containing $t$ and only facts conflicting with $t$; then there is a disjunction $d : t \vee t_1 \vee \ldots \vee t_n$ in $\mathcal{D}$ s.t. $t_1, \ldots, t_n$ are facts conflicting with $t$. Since $m'$ does not satisfy $d$, it is not a model, thus we get a contradiction. Hence $r$ is a minimal model of $\mathcal{D}$.

We have shown that the minimal models of $\mathcal{D}$ are the repairs of $D$ w.r.t. $F$. Since $\mathcal{D} = reduction(\mathcal{D})$, from Theorem 2 we have that $\mathcal{D}$ is the canonical disjunctive database whose minimal models are the repairs of $D$ w.r.t. $F$. □

Corollary 2 Consider a database $D$ in $D_n$ and let $A$ and $B$ be two keys; $||\mathcal{D}_{min}|| = 2n + (n+1) \cdot n2^{n-1}$.

Proof. From Proposition 3 it is easy to see that $\mathcal{D}_{min}$ contains $n$ disjunctions of 2 facts and $n2^{n-1}$ disjunctions of $n + 1$ facts. □
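
The count in Corollary 2 can be checked mechanically for small $n$. The following Python sketch enumerates the two families of disjunctions used in the proof of Proposition 3 (the pairs $t_{i2} \vee t_{i3}$, and the disjunctions $t_{i1} \vee t_{i2} \vee \bigvee_{z \ne i} t_z$ with $t_z \in \{t_{z1}, t_{z3}\}$) and sums their lengths. The encoding of the fact $t_{ij}$ as the pair `(i, j)` and the function names are assumptions made only for this illustration.

```python
from itertools import product


def prop3_disjunctions(n):
    """Enumerate the disjunctions of Proposition 3; fact t_ij is encoded as (i, j)."""
    disjunctions = [frozenset({(i, 2), (i, 3)}) for i in range(1, n + 1)]
    for i in range(1, n + 1):
        # For every z != i choose either t_z1 or t_z3.
        choices = [[(z, 1), (z, 3)] for z in range(1, n + 1) if z != i]
        for pick in product(*choices):
            disjunctions.append(frozenset({(i, 1), (i, 2)}) | frozenset(pick))
    return disjunctions


def size(disjunctions):
    """Total number of fact occurrences, the measure used in Corollary 2."""
    return sum(len(d) for d in disjunctions)


# The closed form of Corollary 2 agrees with the enumeration for small n.
for n in range(1, 6):
    assert size(prop3_disjunctions(n)) == 2 * n + (n + 1) * n * 2 ** (n - 1)
```

The enumeration itself is exponential in $n$, so it is only meant as a sanity check of the formula, not as a way to build the database for large inputs.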

One functional dependency. Given a relation $r$ and a functional dependency $f : X \rightarrow Y$, we denote by $cliques(r, f)$ the partition of $r$ into $n = |\pi_X(r)|$ sets $C_1, \ldots, C_n$, called cliques, s.t. each $C_i$ does not contain two facts with different values on $X$. For each clique $C_i$ in $cliques(r, f)$ we denote by $clusters(C_i)$ the partition of $C_i$ into $m_i = |\pi_Y(C_i)|$ sets $G_1, \ldots, G_{m_i}$, called clusters, s.t. each cluster doesn't contain two facts with different values on $Y$. It is worth noting that (i) facts in the same cluster are not conflicting with each other, (ii) given two different clusters $G_1$, $G_2$ of the same clique, each fact in $G_1$ (resp. $G_2$) is conflicting with every fact in $G_2$ (resp. $G_1$), (iii) the set of repairs of $r$ w.r.t. $f$ is $\{G_1 \cup \ldots \cup G_n \mid G_i \in clusters(C_i)$ for $i = 1..n\}$.
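
The definitions of cliques and clusters, and observation (iii), can be sketched in Python as follows. This is a minimal illustration under stated assumptions: facts are represented as tuples, the attribute sets $X$ and $Y$ are given as tuples of column positions, and the function names `cliques_and_clusters` and `repairs` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def cliques_and_clusters(rows, x_idx, y_idx):
    """Group facts by their X-value (cliques), then by their Y-value (clusters)."""
    cliques = defaultdict(lambda: defaultdict(list))
    for row in rows:
        x = tuple(row[i] for i in x_idx)
        y = tuple(row[i] for i in y_idx)
        cliques[x][y].append(row)
    # One list of clusters (frozensets of facts) per clique.
    return [[frozenset(cluster) for cluster in by_y.values()]
            for by_y in cliques.values()]


def repairs(rows, x_idx, y_idx):
    """Observation (iii): a repair is the union of one cluster per clique."""
    structure = cliques_and_clusters(rows, x_idx, y_idx)
    return {frozenset().union(*choice) for choice in product(*structure)}


# Relation over (A, B, C) with the functional dependency A -> B.
r = [("a", "b1", "c1"), ("a", "b1", "c2"), ("a", "b2", "c1"), ("e", "b1", "c1")]
for rep in repairs(r, x_idx=(0,), y_idx=(1,)):
    print(sorted(rep))
```

The number of repairs is the product of the numbers of clusters of the cliques, so enumerating them explicitly is feasible only for small relations.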

Proposition 4 Given a relation $r$ and a functional dependency $f$, then $\mathcal{D}_{min}$ is equal to $\mathcal{D}$ where

$\mathcal{D} = \{t_1 \vee \ldots \vee t_k \mid \exists C \in cliques(r, f)$ s.t. $clusters(C) = \{G_1, \ldots, G_k\}$ and $t_1 \in G_1, \ldots, t_k \in G_k\}$

Proof. We show first (1) that each minimal model of $\mathcal{D}$ is a repair for $r$ and $f$ and next (2) that each repair of $r$ w.r.t. $f$ is a minimal model of $\mathcal{D}$.

(1) Consider a minimal model $m$ of $\mathcal{D}$. Let $cliques(r, f) = \{C_1, \ldots, C_n\}$ be the cliques for $r$ and $f$. For each clique $C_i$ in $cliques(r, f)$ there is a cluster $G_j$ in $clusters(C_i) = \{G_1, \ldots, G_k\}$ s.t. $G_j \subseteq m$ (otherwise $m$ would not satisfy the disjunction $t_1 \vee \ldots \vee t_k$ in $\mathcal{D}$ where $t_h \in G_h$ and $t_h \notin m$, $h = 1..k$). Let $G_1, \ldots, G_n$ be such clusters, where each $G_l$ is a cluster of $C_l$ for $l = 1..n$. Since $G_1 \cup \ldots \cup G_n \subseteq m$ and $G_1 \cup \ldots \cup G_n \models \mathcal{D}$, then $m = G_1 \cup \ldots \cup G_n$, which is, as we have observed before, a repair.

(2) Consider a repair $s$ in $repairs(r, f)$. As $s$ consists of one cluster for each clique, it is easy to see that $s$ is a model of $\mathcal{D}$. We show that $s$ is minimal by contradiction, assuming that there exists $s' \subset s$ which is a model of $\mathcal{D}$. Let $t$ be a fact in $s$ which is not in $s'$. Let $C_t$ and $G_t$ be the clique and the cluster, respectively, containing $t$; moreover let $clusters(C_t) = \{G_t, G_1, \ldots, G_k\}$. The disjunction $t \vee t_1 \vee \ldots \vee t_k$ where $t_i \in G_i$, $i = 1..k$, which is in $\mathcal{D}$, is not satisfied by $s'$: since $s$ contains exactly one cluster per clique, $s'$ does not contain any fact in $G_i$, $i = 1..k$, and it does not contain $t$ either. This contradicts the fact that $s'$ is a model. So $s$ is a minimal model of $\mathcal{D}$.

Hence the minimal models of $\mathcal{D}$ are exactly the repairs for $r$ and $f$; as $\mathcal{D}$ is equal to its reduction, Theorem 2 entails that $\mathcal{D} = \mathcal{D}_{min}$. □
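
Proposition 4 can be illustrated on a small instance. The sketch below builds the disjunctions of Proposition 4 from a given list of cliques (each clique being a list of clusters) and computes the minimal models by brute force, so that they can be compared with the repairs. The input representation and the names `canonical_fd` and `minimal_models` are assumptions introduced here; the brute-force search is not the algorithm proposed in the paper.

```python
from itertools import combinations, product


def canonical_fd(clusters_per_clique):
    """Proposition 4: pick one fact from every cluster of a clique."""
    return {frozenset(pick)
            for clusters in clusters_per_clique
            for pick in product(*clusters)}


def minimal_models(facts, disjunctions):
    """Brute force: subsets hitting every disjunction, minimal under inclusion."""
    facts = list(facts)
    models = [frozenset(s)
              for k in range(len(facts) + 1)
              for s in combinations(facts, k)
              if all(d & frozenset(s) for d in disjunctions)]
    return {m for m in models if not any(o < m for o in models)}


# Two cliques: the first has clusters {t1, t2} and {t3}, the second only {t4}.
cliques = [[{"t1", "t2"}, {"t3"}], [{"t4"}]]
db = canonical_fd(cliques)
models = minimal_models({"t1", "t2", "t3", "t4"}, db)
# The minimal models are the unions of one cluster per clique, i.e. the repairs.
assert models == {frozenset({"t1", "t2", "t4"}), frozenset({"t3", "t4"})}
```

Note that a clique with a single cluster contributes one unit disjunction per fact, which matches the fact that non-conflicting facts belong to every repair.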

Clearly, the size of $\mathcal{D}_{min}$ may be exponential if the functional dependency is a non-key dependency, as shown in the following example.

Example 3 Consider the relation $r$, consisting of $2n$ facts, reported below and the non-key functional dependency $A \rightarrow B$.

A      B        C
a      $b_1$    $c_1$
a      $b_1$    $c_2$
...    ...      ...
a      $b_n$    $c_1$
a      $b_n$    $c_2$

There is a unique clique consisting of $n$ clusters $G_i = \{t_i', t_i''\}$, $i = 1..n$. Then $\mathcal{D}_{min} = \{t_1 \vee \ldots \vee t_n \mid t_i \in G_i$ for $i = 1..n\}$ and $||\mathcal{D}_{min}|| = n2^n$.
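
The size claimed in Example 3 follows from a direct count: there are $2^n$ ways to pick one fact from each of the $n$ two-element clusters, and each resulting disjunction has $n$ facts. A short Python sketch of this count is given below; the string names of the facts and the function names are assumptions made for illustration.

```python
from itertools import product


def example3_canonical(n):
    """Disjunctions of Example 3: one fact from each cluster G_i = {t_i', t_i''}."""
    clusters = [[f"t{i}'", f"t{i}''"] for i in range(1, n + 1)]
    return [frozenset(pick) for pick in product(*clusters)]


def size(disjunctions):
    """Total number of fact occurrences in the disjunctive database."""
    return sum(len(d) for d in disjunctions)


# 2^n disjunctions of n facts each, hence size n * 2^n.
for n in range(1, 7):
    db = example3_canonical(n)
    assert len(db) == 2 ** n
    assert size(db) == n * 2 ** n
```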
5 Cardinality-based repairs

In this section we consider cardinality-based repairs, that is, consistent databases which minimally differ from the original database in terms of the number of facts in the symmetric difference (in the previous sections we have considered consistent databases for which the symmetric difference is minimal under set inclusion; we will refer to them as S-repairs).

We show that, as in Section 4, the size of the canonical disjunctive database (representing the cardinality-based repairs) is linear when only one key constraint is considered, whereas it may be exponential when two keys or one non-key functional dependency are considered.

It is easy to see that in the presence of only one key constraint the cardinality-based 
repairs coincide with the S-repairs, so the canonical disjunctive database is of linear 
size. 

When the constraints consist of one functional dependency, it is easy to see that if, for every clique, its clusters have the same cardinality, then the cardinality-based repairs coincide with the S-repairs. This is the case for the database of Example 3, where the size of the canonical disjunctive database is exponential.
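
The observation about one functional dependency can be made concrete with a small sketch. Since the cliques are independent of each other, a cardinality-based repair is obtained by taking a largest cluster from every clique, whereas an S-repair may take any cluster. The representation of a relation as a list of cliques (each a list of clusters) and the function names are assumptions made for this illustration.

```python
from itertools import product


def s_repairs(cliques):
    """S-repairs: the union of an arbitrary cluster from each clique."""
    return {frozenset().union(*choice) for choice in product(*cliques)}


def c_repairs(cliques):
    """Cardinality-based repairs: the union of a largest cluster from each clique."""
    largest = []
    for clusters in cliques:
        best = max(len(g) for g in clusters)
        largest.append([g for g in clusters if len(g) == best])
    return {frozenset().union(*choice) for choice in product(*largest)}


# Clusters of equal cardinality: the two notions of repair coincide.
equal = [[frozenset({"t1", "t2"}), frozenset({"t3", "t4"})]]
assert s_repairs(equal) == c_repairs(equal)

# Clusters of different cardinality: only the larger cluster is kept.
unequal = [[frozenset({"t1", "t2"}), frozenset({"t3"})]]
assert c_repairs(unequal) == {frozenset({"t1", "t2"})}
assert len(s_repairs(unequal)) == 2
```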

Finally, we consider the case of two key constraints. We directly show that the size of the canonical disjunctive database is also exponential.

Lemma 1 Consider a database $D$ in $D_n$ and a set of integrity constraints $F$ consisting of two keys, $A$ and $B$. Then the set of S-repairs is equal to $R$ where

$R = \{\{t_{12}, \ldots, t_{n2}\}\} \cup \{\{t_{i1}, t_{i3}\} \cup \bigcup_{j=1..n \wedge j \ne i} \{t_j\} \mid 1 \le i \le n \wedge t_j \in \{t_{j2}, t_{j3}\}\}$

Proof. It is easy to see that each database in $R$ is an S-repair.

Consider an S-repair $r$ of $D$ w.r.t. $F$. We show that $r$ is in $R$ using reasoning by cases:

— Suppose that $t_{13} \in r$. Then $t_{12} \notin r$ and either (1) $t_{11} \in r$ or (2) $t_{11} \notin r$.

1. Since $t_{11} \in r$, for $j = 2..n$ $t_{j1} \notin r$ and either $t_{j2}$ or $t_{j3}$ is in $r$, that is $r = \{t_{11}, t_{13}, t_2, \ldots, t_n\}$ where $t_j \in \{t_{j2}, t_{j3}\}$, $j = 2..n$. It is easy to see that $r \in R$.

2. Since $t_{11} \notin r$, there exists $t_{k1} \in r$ with $2 \le k \le n$. Then $t_{k2} \notin r$ and $t_{k3} \in r$. For $j = 2..n \wedge j \ne k$, $t_{j1} \notin r$ and either $t_{j2}$ or $t_{j3}$ is in $r$, that is $r = \{t_{13}, t_{k1}, t_{k3}\} \cup \bigcup_{j=2..n \wedge j \ne k} \{t_j\}$ where $t_j \in \{t_{j2}, t_{j3}\}$. Clearly, $r \in R$.

— Suppose that $t_{13} \notin r$. Then $t_{12} \in r$ and $t_{11} \notin r$. Two cases may occur: either (1) there exists $t_{k1} \in r$ with $2 \le k \le n$ or (2) $t_{j1} \notin r$ for $j = 1..n$.

1. Since $t_{k1} \in r$ then $t_{k2} \notin r$ and $t_{k3} \in r$. For $j = 2..n \wedge j \ne k$, $t_{j1} \notin r$ and either $t_{j2}$ or $t_{j3}$ is in $r$, that is $r = \{t_{12}, t_{k1}, t_{k3}\} \cup \bigcup_{j=2..n \wedge j \ne k} \{t_j\}$ where $t_j \in \{t_{j2}, t_{j3}\}$. It is easy to see that $r \in R$.

2. $r = \{t_{12}, \ldots, t_{n2}\}$ which is in $R$. □

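Corollary 3 Consider a database $D$ in $D_n$ and a set of integrity constraints $F$ consisting of two keys, $A$ and $B$. Then the set of cardinality-based repairs is

$\{\{t_{i1}, t_{i3}\} \cup \bigcup_{j=1..n \wedge j \ne i} \{t_j\} \mid 1 \le i \le n \wedge t_j \in \{t_{j2}, t_{j3}\}\}$

Proof. Straightforward from Lemma 1: the S-repair $\{t_{12}, \ldots, t_{n2}\}$ contains $n$ facts, whereas every other S-repair contains $n + 1$ facts. □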
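
Lemma 1 and Corollary 3 can be checked by brute force for small $n$. The sketch below rests on an assumption inferred from the proofs above rather than on the definition of $D_n$ itself: $t_{i1}$ conflicts with $t_{i2}$ and with every $t_{k1}$ ($k \ne i$), and $t_{i2}$ conflicts with $t_{i3}$. The fact $t_{ij}$ is encoded as the pair `(i, j)`, and all function names are hypothetical.

```python
from itertools import combinations


def conflict_edges(n):
    """Assumed conflicts of D_n: t_i1-t_i2, t_i2-t_i3, and t_i1-t_k1 for i != k."""
    edges = set()
    for i in range(1, n + 1):
        edges.add(frozenset({(i, 1), (i, 2)}))
        edges.add(frozenset({(i, 2), (i, 3)}))
        for k in range(i + 1, n + 1):
            edges.add(frozenset({(i, 1), (k, 1)}))
    return edges


def s_repairs(n):
    """S-repairs: consistent subsets that are maximal under set inclusion."""
    facts = [(i, j) for i in range(1, n + 1) for j in (1, 2, 3)]
    edges = conflict_edges(n)
    consistent = [frozenset(s)
                  for k in range(len(facts) + 1)
                  for s in combinations(facts, k)
                  if not any(e <= frozenset(s) for e in edges)]
    return {s for s in consistent if not any(s < o for o in consistent)}


def c_repairs(n):
    """Cardinality-based repairs: consistent subsets of maximum cardinality."""
    reps = s_repairs(n)
    best = max(len(r) for r in reps)
    return {r for r in reps if len(r) == best}


n = 3
all_t2 = frozenset((i, 2) for i in range(1, n + 1))
assert len(s_repairs(n)) == 1 + n * 2 ** (n - 1)   # Lemma 1
assert c_repairs(n) == s_repairs(n) - {all_t2}     # Corollary 3
```

Under these assumptions the only S-repair that is not cardinality-based is $\{t_{12}, \ldots, t_{n2}\}$, which has $n$ facts instead of $n + 1$.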
The following proposition identifies the canonical disjunctive database for a database in $D_n$ for which $A$ and $B$ are keys; such a disjunctive database is of exponential size. In the following proposition and corollary, $\mathcal{D}_{min}$ denotes the canonical disjunctive database representing the set of cardinality-based repairs.

Proposition 5 Consider a database $D$ in $D_n$ and a set of integrity constraints $F$ consisting of two keys, $A$ and $B$. Then the canonical disjunctive database $\mathcal{D}_{min}$ is equal to $\mathcal{D}$ where

$\mathcal{D} = \{t_{i2} \vee t_{i3} \mid 1 \le i \le n\} \cup \{t_1 \vee \ldots \vee t_n \mid t_i \in \{t_{i1}, t_{i3}\}, i = 1..n\}$

Proof. We first show that (1) each cardinality-based repair of $D$ w.r.t. $F$ is a minimal model of $\mathcal{D}$ and next that (2) each minimal model of $\mathcal{D}$ is a cardinality-based repair.

(1) Consider a cardinality-based repair $r$ of $D$ w.r.t. $F$. We show first that (a) $r$ is a model of $\mathcal{D}$ and next that (b) it is a minimal model.

(a) From Corollary 3 it is easy to see that $r$ satisfies each disjunction $t_{i2} \vee t_{i3}$ in $\mathcal{D}$, $1 \le i \le n$. Since Corollary 3 entails that there exists $1 \le j \le n$ s.t. $\{t_{j1}, t_{j3}\} \subseteq r$, then $r$ satisfies each disjunction $t_1 \vee \ldots \vee t_n$ in $\mathcal{D}$ (where $t_i \in \{t_{i1}, t_{i3}\}$, $i = 1..n$). Thus $r$ is a model of $\mathcal{D}$.

(b) We observe that for each fact $t \in D$ there is a disjunction $t \vee t_1 \vee \ldots \vee t_n$ in $\mathcal{D}$ s.t. $t_1, \ldots, t_n$ are facts conflicting with $t$: for the facts $t_{i2}$ and $t_{i3}$ ($i = 1..n$) such disjunctions are $t_{i2} \vee t_{i3}$; for the facts $t_{i1}$ ($i = 1..n$) there is the disjunction $t_{11} \vee \ldots \vee t_{n1}$. In the same way as in Proposition 3 it can be shown that $r$ is a minimal model of $\mathcal{D}$.

(2) Consider a minimal model $m$ of $\mathcal{D}$. The fact that $m$ is an S-repair of $D$ w.r.t. $F$ can be shown in the same way as in Proposition 3.

It is easy to see that $\{t_{12}, \ldots, t_{n2}\}$ is not a model of $\mathcal{D}$ and then, from Lemma 1 and Corollary 3, $m$ is a cardinality-based repair of $D$ w.r.t. $F$.

We have shown that $\mathcal{D}$ represents the cardinality-based repairs of $D$ w.r.t. $F$; since $\mathcal{D} = reduction(\mathcal{D})$, from Theorem 2 we have that $\mathcal{D}$ is the canonical one. □

Corollary 4 Consider a database $D$ in $D_n$ and let $A$ and $B$ be two keys; $||\mathcal{D}_{min}|| = 2n + n2^n$.

Proof. From Proposition 5 it is easy to see that $\mathcal{D}_{min}$ contains $n$ disjunctions of 2 facts and $2^n$ disjunctions of $n$ facts. □
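
Proposition 5 and Corollary 4 can likewise be checked for small $n$. The sketch below builds the disjunctive database of Proposition 5, computes its minimal models by brute force, and compares them with the closed form of Corollary 3. The encoding of $t_{ij}$ as `(i, j)` and the function names are assumptions made for illustration; the exhaustive search is only a sanity check and is not the paper's algorithm.

```python
from itertools import combinations, product


def prop5_database(n):
    """Disjunctions of Proposition 5; fact t_ij is encoded as (i, j)."""
    pairs = [frozenset({(i, 2), (i, 3)}) for i in range(1, n + 1)]
    choices = [[(i, 1), (i, 3)] for i in range(1, n + 1)]
    return pairs + [frozenset(pick) for pick in product(*choices)]


def corollary3_repairs(n):
    """Closed form of the cardinality-based repairs given in Corollary 3."""
    reps = set()
    for i in range(1, n + 1):
        others = [[(j, 2), (j, 3)] for j in range(1, n + 1) if j != i]
        for pick in product(*others):
            reps.add(frozenset({(i, 1), (i, 3)}) | frozenset(pick))
    return reps


def minimal_models(n, disjunctions):
    """Brute force: subsets hitting every disjunction, minimal under inclusion."""
    facts = [(i, j) for i in range(1, n + 1) for j in (1, 2, 3)]
    models = [frozenset(s)
              for k in range(len(facts) + 1)
              for s in combinations(facts, k)
              if all(d & frozenset(s) for d in disjunctions)]
    return {m for m in models if not any(o < m for o in models)}


n = 3
db = prop5_database(n)
assert minimal_models(n, db) == corollary3_repairs(n)      # Proposition 5
assert sum(len(d) for d in db) == 2 * n + n * 2 ** n       # Corollary 4
```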

6 Conclusions 

In this paper we have addressed the problem of representing, by means of a disjunc- 
tive database, the set of repairs of a database w.r.t. a set of denial constraints. We 
have shown that, given a database and a set of denial constraints, there exists a unique 
canonical disjunctive database representing their repairs: any disjunctive database with 
the same set of minimal models is a superset of the canonical one, containing in ad- 
dition disjunctions which are subsumed by the disjunctions in the canonical one. We 
have proposed an algorithm to compute the canonical disjunctive database. We have 
shown that the size of the canonical disjunctive database is linear when only one key is 
considered, but it may be exponential in the presence of two keys or one non-key func- 
tional dependency. We have shown that these results hold also when cardinality-based 
repairs are considered. 
Future work in this area could explore different representations for the set of repairs. For instance, one can consider formulas with negation or non-clausal formulas. Such formulas can be more succinct than disjunctive databases, although they may make query evaluation harder. We also observe that in the case of the repairs of a single relation the resulting disjunctive database consists of disjunctions of elements of this relation. It has been recognized that such disjunctions should be supported by database management systems [I]. Moreover, one could consider restricting inconsistent databases in such a way that the resulting repairs can be represented by relational databases with OR-objects [12]. In this case, one could use the techniques for computing certain query answers over databases with OR-objects [13] to compute consistent query answers over inconsistent databases. Finally, other kinds of representations of sets of possible worlds, e.g., world-set decompositions [1,, should be considered. For example, the set of repairs of the database in Example 3 can be represented as a world-set decomposition of polynomial size.